Summary

We present asymptotic and finite-sample results on the use of stochastic blockmodels for the 
analysis of network data. We show that the fraction of misclassified network nodes converges 
in probability to zero under maximum likelihood fitting when the number of classes is allowed 
to grow as the root of the network size and the average network degree grows at least poly- 
logarithmically in this size. We also establish finite-sample confidence bounds on maximum- 
likelihood blockmodel parameter estimates from data comprising independent Bernoulli random 
variates; these results hold uniformly over class assignment. We provide simulations verifying 
the conditions sufficient for our results, and conclude by fitting a logit parameterization of a 
stochastic blockmodel with covariates to a network data example comprising a collection of 
Facebook profiles, resulting in block estimates that reveal residual structure. 

Some key words: Likelihood-based inference; Social network analysis; Sparse random graph; Stochastic blockmodel. 



1. Introduction 

The global structure of social, biological, and information networks is sometimes envisioned 
as the aggregate of many local interactions whose effects propagate in ways that are not yet 
well understood. There is increasing opportunity to collect data on an appropriate scale for such
systems, but their analysis remains challenging (Goldenberg et al., 2009). Here we analyze a
statistical model for network data known as the (single-membership) stochastic blockmodel.
Its salient feature is that it partitions the N nodes of a network into K distinct classes whose
members all interact similarly with the network. Blockmodels were first associated with the
deterministic concept of structural equivalence in social network analysis (Lorrain & White,
1971), where two nodes were considered interchangeable if their connections were equivalent
in a formal sense. This concept was adapted to stochastic settings and gave rise to the
stochastic blockmodel in work by Holland et al. (1983) and Fienberg et al. (1985). The model
and extensions thereof have since been applied in a variety of disciplines (Wang & Wong,
1987; Nowicki & Snijders, 2001; Girvan & Newman, 2002; Airoldi et al., 2005; Doreian et al.,
2005; Newman, 2006; Handcock et al., 2007; Hoff, 2008; Airoldi et al., 2008; Copic et al., 2009;
Mariadassou et al., 2010; Karrer & Newman, 2011).

In this work we provide a finite-sample confidence bound that can be used when estimating 
network structure from data modeled by independent Bernoulli random variates, and also show 
that under maximum likelihood fitting of a correctly specified K-class blockmodel, the fraction
of misclassified network nodes converges in probability to zero even when the number of classes
K grows with N. As noted by Rohe et al. (2011), this is advantageous if we expect class sizes to
remain relatively constant even as N increases. Related results for fixed K have been shown by
Snijders & Nowicki (1997) for networks with linearly increasing degree, and in a stronger sense
for sparse graphs with poly-logarithmically increasing degree by Bickel & Chen (2009).

Our results can be related to those of Rohe et al. (2011), who use spectral methods to bound
the number of misclassified nodes in the stochastic blockmodel with increasing K, although
with the more restrictive requirement of nearly linearly increasing degree. As noted by those
authors, this assumption may not hold in many practical settings. Our manner of proof requires
only poly-logarithmically increasing degree, and is more closely related to the fixed-K proof
of Bickel & Chen (2009), although we note that spectral clustering as suggested by Rohe et al.
(2011) provides a computationally appealing alternative to maximum likelihood fitting in practice.

As discussed by Bickel & Chen (2009), one may assume exchangeability in lieu of a generative
K-class blockmodel: an analogue to de Finetti's theorem for exchangeable sequences states
that the probability distribution of an infinite exchangeable random graph is expressible as a
mixture of distributions whose components can be approximated by blockmodels (Kallenberg, 2005;
Bickel & Chen, 2009). An observed network can then be viewed as a sample drawn from this
infinite conceptual population, and so in this case the fitted blockmodel describes one mixture 
component thereof. 



2. Statement of results 
2·1. Problem formulation and definitions
We consider likelihood-based inference for independent Bernoulli data $\{A_{ij}\}$
$(i = 1, \ldots, N;\ j = i+1, \ldots, N)$, both when no structure linking the success
probabilities $\{P_{ij}\}$ is assumed, as well as the special case when a stochastic blockmodel
of known order K is assumed to apply. To this end, let $A \in \{0, 1\}^{N \times N}$ denote the
symmetric adjacency matrix of a simple, undirected graph on N nodes whose entries $\{A_{ij}\}$
for $i < j$ are assumed independent Bernoulli($P_{ij}$) random variates, and whose main diagonal
$\{A_{ii}\}_{i=1}^{N}$ is fixed to zero. The average degree of this graph is $2M/N$, where
$M = \sum_{i<j} P_{ij}$ is its expected number of edges. Under a K-class stochastic blockmodel,
these edge probabilities are further restricted to satisfy

$$P_{ij} = \theta_{z_i z_j} \quad (i = 1, \ldots, N;\ j = i+1, \ldots, N) \tag{1}$$

for some symmetric matrix $\theta \in [0, 1]^{K \times K}$ and membership vector
$z \in \{1, \ldots, K\}^N$. Thus the
probability of an edge between two nodes is assumed to depend only on the class of each node.
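
As a concrete illustration of the model in (1), the following minimal sketch draws one adjacency matrix from a blockmodel. The function name `sample_blockmodel`, the zero-based class labels, and the toy parameter values are assumptions made for this example only; they do not come from the paper.

```python
import random

def sample_blockmodel(z, theta, seed=None):
    """Draw a symmetric 0/1 adjacency matrix with A[i][j] ~ Bernoulli(theta[z[i]][z[j]])."""
    rng = random.Random(seed)
    n = len(z)
    A = [[0] * n for _ in range(n)]          # main diagonal stays fixed at zero
    for i in range(n):
        for j in range(i + 1, n):            # one independent trial per pair i < j
            p = theta[z[i]][z[j]]            # equation (1): P_ij = theta_{z_i z_j}
            A[i][j] = A[j][i] = 1 if rng.random() < p else 0
    return A

# Hypothetical toy example: two classes, labelled 0 and 1 rather than 1..K,
# with denser connections within classes than between them.
z = [0, 0, 0, 1, 1, 1]
theta = [[0.9, 0.1],
         [0.1, 0.8]]
A = sample_blockmodel(z, theta, seed=1)
```

Only the pairs with i < j are sampled, and the lower triangle is filled by symmetry, which mirrors the independence assumption stated above.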
Let $L(A; z, \theta)$ denote the log-likelihood of observing data matrix $A$ under a K-class
blockmodel with parameters $(z, \theta)$, and $L_P(z, \theta)$ its expectation:

$$L(A; z, \theta) = \sum_{i<j} \{A_{ij} \log \theta_{z_i z_j} + (1 - A_{ij}) \log(1 - \theta_{z_i z_j})\},$$
$$L_P(z, \theta) = \sum_{i<j} \{P_{ij} \log \theta_{z_i z_j} + (1 - P_{ij}) \log(1 - \theta_{z_i z_j})\}.$$

For fixed class assignment $z$, let $N_a$ denote the number of nodes assigned to class $a$, and let
$n_{ab}$ denote the maximum number of possible edges between classes $a$ and $b$; i.e.,
$n_{ab} = N_a N_b$ if $a \ne b$ and $n_{aa} = \binom{N_a}{2}$. Further, let $\hat\theta^{(z)}$
and $\bar\theta^{(z)}$ be symmetric matrices in $[0, 1]^{K \times K}$, with

$$\hat\theta^{(z)}_{ab} = \frac{1}{n_{ab}} \sum_{i<j} A_{ij}\, 1\{z_i = a, z_j = b\} \quad (a = 1, \ldots, K;\ b = a, \ldots, K),$$
$$\bar\theta^{(z)}_{ab} = \frac{1}{n_{ab}} \sum_{i<j} P_{ij}\, 1\{z_i = a, z_j = b\} \quad (a = 1, \ldots, K;\ b = a, \ldots, K)$$

defined whenever $n_{ab} \ne 0$. Observe that $\hat\theta^{(z)}$ comprises sample proportion
estimators as a function of $z$, whereas $\bar\theta^{(z)}$ is its expectation under the
independent {Bernoulli($P_{ij}$)} model. Taken over all class assignments
$z \in \{1, \ldots, K\}^N$, the sets $\{\hat\theta^{(z)}\}$ comprise a sufficient statistic for
the family of K-class stochastic blockmodels, and for each $z$, $\hat\theta^{(z)}$ maximizes
$L(A; z, \cdot)$. Analogously, the sets $\{\bar\theta^{(z)}\}$ are functions of the model
parameters $\{P_{ij}\}_{i<j}$, and maximize $L_P(z, \cdot)$. We write $\hat\theta$ and
$\bar\theta$ when the choice of $z$ is understood, and $L(A; z)$ and $L_P(z)$ to abbreviate
$\sup_\theta L(A; z, \theta)$ and $\sup_\theta L_P(z, \theta)$ respectively.

Finally, observe that when a blockmodel with parameters $(\bar z, \bar\theta)$ is in force, then
$P_{ij} = \bar\theta_{\bar z_i \bar z_j}$ in accordance with (1), and consequently $L_P$ is
maximized by the true parameter values $(\bar z, \bar\theta)$:

$$L_P(\bar z, \bar\theta) - L_P(z, \theta) = \sum_{i<j} D(P_{ij} \,\|\, \theta_{z_i z_j}) \ge \sum_{i<j} 2 (P_{ij} - \theta_{z_i z_j})^2 \ge 0,$$

where $D(p \,\|\, p')$ denotes the Kullback-Leibler divergence of a Bernoulli($p'$) distribution
from a Bernoulli($p$) one.

2·2. Fitting a K-class stochastic blockmodel to independent Bernoulli trials
Fitting a K-class stochastic blockmodel to independent Bernoulli($P_{ij}$) trials yields estimates
$\hat\theta^{(z)}$ of averages $\bar\theta^{(z)}$ of subsets of the parameter set $\{P_{ij}\}$,
with each class assignment $z$ inducing a partition of that set. We begin with a basic lemma that
expresses the difference $L(A; z) - L_P(z)$ in terms of $\hat\theta^{(z)}$ and
$\bar\theta^{(z)}$, and follows directly from their respective maximizing properties.

LEMMA 1. Let $\{A_{ij}\}_{i<j}$ comprise independent Bernoulli($P_{ij}$) trials. Then the difference
$\sup_\theta L(A; z, \theta) - \sup_\theta L_P(z, \theta)$ can be expressed for
$X = \sum_{i<j} A_{ij} \log\{\bar\theta_{z_i z_j} / (1 - \bar\theta_{z_i z_j})\}$ as

$$L(A; z) - L_P(z) = \sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) + X - E(X).$$
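
The identity in Lemma 1 can be checked numerically. The sketch below is an illustration only: the helper names (`bern_kl`, `neg_entropy`, `block_summaries`, `lemma1_sides`) and the random test inputs are assumptions of this example, and blocks are indexed by unordered class pairs so that each pair i < j is counted once. It computes the left-hand side from the maximized log-likelihoods and the right-hand side from the divergence term plus X - E(X).

```python
import math
import random

def bern_kl(p, q):
    """D(p || q) for Bernoulli laws, with the convention 0 log 0 = 0; needs 0 < q < 1."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total

def neg_entropy(t):
    """t log t + (1 - t) log(1 - t), with 0 log 0 = 0."""
    return sum(x * math.log(x) for x in (t, 1 - t) if x > 0)

def block_summaries(M, z):
    """Return {(a, b): (n_ab, block average of M)} over unordered class pairs a <= b."""
    sums, counts = {}, {}
    n = len(z)
    for i in range(n):
        for j in range(i + 1, n):
            key = (min(z[i], z[j]), max(z[i], z[j]))
            sums[key] = sums.get(key, 0.0) + M[i][j]
            counts[key] = counts.get(key, 0) + 1
    return {k: (counts[k], sums[k] / counts[k]) for k in counts}

def lemma1_sides(A, P, z):
    hat = block_summaries(A, z)   # theta-hat: block averages of the data
    bar = block_summaries(P, z)   # theta-bar: block averages of the P_ij
    # Left side: L(A; z) - L_P(z), each maximized over theta block by block.
    lhs = sum(n * (neg_entropy(t) - neg_entropy(bar[k][1])) for k, (n, t) in hat.items())
    kl = sum(n * bern_kl(t, bar[k][1]) for k, (n, t) in hat.items())
    # X - E(X) = sum_{i<j} (A_ij - P_ij) * log-odds of theta-bar for the block of (i, j).
    x_dev = sum(n * (t - bar[k][1]) * math.log(bar[k][1] / (1 - bar[k][1]))
                for k, (n, t) in hat.items())
    return lhs, kl + x_dev

rng = random.Random(0)
n = 12
P = [[rng.uniform(0.1, 0.9) for _ in range(n)] for _ in range(n)]   # only i < j is used
A = [[1.0 if rng.random() < P[i][j] else 0.0 for j in range(n)] for i in range(n)]
z = [rng.randrange(3) for _ in range(n)]
lhs, rhs = lemma1_sides(A, P, z)
```

The two returned values should agree up to floating-point error for any assignment z, since the identity is algebraic rather than probabilistic.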

We first bound the former quantity in this expression, which provides a measure of the distance
between $\hat\theta$ and its estimand $\bar\theta$ under the setting of Lemma 1. The bound is used
in subsequent asymptotic results, and also yields a kind of confidence measure on $\bar\theta$ in
the finite-sample regime.

THEOREM 1. Suppose that a K-class stochastic blockmodel is fitted to data $\{A_{ij}\}_{i<j}$
comprising $\binom{N}{2}$ independent Bernoulli($P_{ij}$) trials, where, for any class
assignment $z$, estimate $\hat\theta$ maximizes the blockmodel log-likelihood $L(A; z, \cdot)$.
Then with probability at least $1 - \delta$,

$$\max_z \Big\{ \sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \Big\} < N \log K + (K^2 + K) \log\Big(\frac{N}{K} + 1\Big) + \log\frac{1}{\delta}. \tag{2}$$

Theorem 1 is proved in the Appendix via the method of types: for fixed $z$, the probability of
any realization of $\hat\theta$ is first bounded by
$\exp\{-\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab})\}$. A counting argument
then yields a deviation result in terms of $(N/K + 1)^{K^2 + K}$, and finally a union bound is
applied so that the result holds uniformly over all $K^N$ possible choices of assignment vector $z$.
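
The right-hand side of (2) is simple to evaluate. The following sketch, with hypothetical function names and example inputs chosen here for illustration, computes it for given N, K and delta; dividing by the number of node pairs is one possible normalization and is an assumption of this example.

```python
import math

def theorem1_bound(n_nodes, n_classes, delta):
    """Right-hand side of (2): N log K + (K^2 + K) log(N/K + 1) + log(1/delta)."""
    N, K = n_nodes, n_classes
    return N * math.log(K) + (K * K + K) * math.log(N / K + 1) + math.log(1 / delta)

def normalized_bound(n_nodes, n_classes, delta):
    """Bound (2) divided by the number of node pairs, N choose 2 (illustrative scaling)."""
    return theorem1_bound(n_nodes, n_classes, delta) / math.comb(n_nodes, 2)

# Example values only: how the bound grows with K at fixed N and delta.
for K in (5, 10, 20):
    print(K, round(theorem1_bound(500, K, 0.05), 1))
```

The term N log K comes from the union bound over assignments and dominates for small K, while the (K^2 + K) log(N/K + 1) term from the counting argument takes over as K grows.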

Our second result is asymptotic, and combines Theorem 1 with a Bernstein inequality for
bounded random variables, applied to the latter terms $X - E(X)$ in Lemma 1. To ensure
boundedness we assume minimal restrictions on each $P_{ij}$; this Bernstein inequality, coupled
with a union bound to ensure that the result holds uniformly over all $z$, dictates growth
restrictions on $K$ and $M$.

THEOREM 2. Assume the setting of Theorem 1, whereby a K-class blockmodel is fitted to
$\binom{N}{2}$ independent Bernoulli($P_{ij}$) random variates $\{A_{ij}\}_{i<j}$, and further
assume that $1/N^2 \le P_{ij} \le 1 - 1/N^2$ for all $N$ and $i < j$. Then if $K = O(N^{1/2})$
and $M = \omega(N (\log N)^{3+\delta})$ for some $\delta > 0$,

$$\max_z |L(A; z) - L_P(z)| = o_P(M).$$

Thus whenever each $P_{ij}$ is bounded away from 0 and 1 in the manner above, the maximized
log-likelihood function $L(A; z) = \sup_\theta L(A; z, \theta)$ is asymptotically well behaved in
network size $N$ as long as the network's average degree $2M/N$ grows faster than
$(\log N)^{3+\delta}$ and the number $K$ of classes fitted to it grows no faster than $N^{1/2}$.
2·3. Fitting a correctly specified K-class stochastic blockmodel
The above results apply to the general case of independent Bernoulli data $\{A_{ij}\}$, with no
additional structure assumed amongst the set of success probabilities $\{P_{ij}\}$; if we further
assume the data to be generated by a K-class stochastic blockmodel whose parameters
$(\bar z, \bar\theta)$ are subject to suitable identifiability conditions, it is possible to
characterize the behavior of the class assignment estimator $\hat z$ under maximum likelihood
fitting of a correctly specified K-class blockmodel.

THEOREM 3. If the conclusion $\max_z |L(A; z) - L_P(z)| = o_P(M)$ of Theorem 2 holds, and
data are generated according to a K-class blockmodel with membership vector $\bar z$, then

$$L_P(\bar z) - L_P(\hat z) = o_P(M), \tag{3}$$

with respect to the maximum-likelihood K-class blockmodel class assignment estimator $\hat z$.

Let $N_e(\hat z)$ be the number of incorrect class assignments under $\hat z$, counted for every
node whose true class under $\bar z$ is not in the majority within its estimated class under
$\hat z$. If furthermore the following identifiability conditions hold with respect to the model
sequence:

(i) for all blockmodel classes $a = 1, \ldots, K$, class size $N_a$ grows as $\min_a \{N_a\} = \Omega(N/K)$;

(ii) the following holds over all distinct class pairs $(a, b)$ and all classes $c$:

$$\min_{(a,b)} \max_c \Big\{ D\Big(\bar\theta_{ac} \,\Big\|\, \frac{\bar\theta_{ac} + \bar\theta_{bc}}{2}\Big) + D\Big(\bar\theta_{bc} \,\Big\|\, \frac{\bar\theta_{ac} + \bar\theta_{bc}}{2}\Big) \Big\} = \Omega\Big(\frac{MK}{N^2}\Big),$$

then it follows from (3) that $N_e(\hat z) = o_P(N)$.

Thus the conclusion of Theorem 3 is that under suitable conditions the fraction $N_e/N$ of
misclassified nodes goes to zero in $N$, yielding a convergence result for stochastic blockmodels
with growing number of classes. Condition (i) stipulates that all class sizes grow at a rate that
is eventually bounded below by a single constant times $N/K$, while condition (ii) ensures that
any two rows of $\bar\theta$ differ in at least one entry by an amount that is eventually bounded
below by a single constant times $MK/N^2$. Observe that if eventually $K = N^{1/2}$ and
$M = N (\log N)^4$ so that conditions on $K$ and $M$ sufficient for Theorem 2 are met, then since
$(\log N)^4 = o(N^{1/2})$, it follows that $MK/N^2$ goes to zero in $N$.
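
The closing observation can be spelled out in one line. Substituting the stated choices $K = N^{1/2}$ and $M = N(\log N)^4$, which are the only inputs assumed here, gives

```latex
\[
\frac{MK}{N^2}
  = \frac{N (\log N)^4 \cdot N^{1/2}}{N^2}
  = \frac{(\log N)^4}{N^{1/2}}
  \longrightarrow 0 \qquad (N \to \infty),
\]
```

so the separation between rows of the blockmodel parameter required by condition (ii) is allowed to shrink as the network grows.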

3. Numerical results 

We now present results of a small simulation study undertaken to investigate the assumptions
and conditions of Theorems 1-3 above, in which K-class blockmodels were fitted to various
networks generated at random from models corresponding to each of the three theorems. Because
exact maximization in $z$ of the blockmodel log-likelihood $L(A; z, \theta)$ is computationally
intractable even for moderate $N$, we instead employed Gibbs sampling to explore the function
$\max_\theta L(A; z, \theta)$ and recorded the best value of $z$ visited by the sampler. As the
results of Theorems 1 and 2 hold uniformly in $z$, however, we expect $\bar\theta$ and $L_P(z)$
to be close to their empirical estimates whenever $N$ is sufficiently large, regardless of the
approach employed to select $z$. This fact also suggests that a single-class (Erdős-Rényi)
blockmodel may come closest to achieving equality in Theorems 1 and 2, as many class assignments
are equally likely a priori to have high likelihood. By similar reasoning, a weakly identifiable
model should come closest to achieving the error bound in Theorem 3, such as one with nearly
identical within- and between-class edge probabilities. We describe each of these cases
empirically in the remainder of this section.

First, the tightness of the confidence bound of (2) from Theorem 1 was investigated by fitting
K-class blockmodels to Erdős-Rényi networks comprising $\binom{N}{2}$ independent Bernoulli($p$)
trials, with $N$ = 500 nodes and $p$ = 0·075 chosen to match the data analysis example in the
sequel, and $K \in \{5, 10, 20, 30, 40, 50\}$. For each $K$, the error terms
$\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab})$ and
$\{\sum_{a \le b} n_{ab} (\hat\theta_{ab} - \bar\theta_{ab})^2\}^{1/2}$ were recorded for each of
100 trials and compared to the respective 95% confidence bounds ($\delta$ = 0·05) derived from
Theorem 1. The bounds overestimated the respective errors by a factor of 3 to 7 on average, with
small standard deviation. In this worst-case scenario the bound is loose, but not unusable; the
errors never exceeded the 95% confidence bounds in any of the trials.
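
A reduced version of this comparison can be sketched in a few lines. The example below is not the procedure used in the study: it draws an arbitrary class assignment rather than a fitted one, uses a smaller network for speed, and all function names and parameter values are assumptions of this sketch. Under an Erdős-Rényi model every block average of the success probabilities equals p, so the divergence term of (2) can be computed directly.

```python
import math
import random

def bern_kl(p, q):
    """D(p || q) for Bernoulli laws, with 0 log 0 = 0; requires 0 < q < 1."""
    out = 0.0
    if p > 0:
        out += p * math.log(p / q)
    if p < 1:
        out += (1 - p) * math.log((1 - p) / (1 - q))
    return out

def kl_error(N, K, p, rng):
    """Sum over blocks a <= b of n_ab * D(theta_hat_ab || p) for one Erdos-Renyi draw."""
    z = [rng.randrange(K) for _ in range(N)]     # arbitrary assignment, not a fitted one
    edges, pairs = {}, {}
    for i in range(N):
        for j in range(i + 1, N):
            key = (min(z[i], z[j]), max(z[i], z[j]))
            pairs[key] = pairs.get(key, 0) + 1
            edges[key] = edges.get(key, 0) + (1 if rng.random() < p else 0)
    return sum(n * bern_kl(edges[k] / n, p) for k, n in pairs.items())

def theorem1_bound(N, K, delta):
    """Right-hand side of (2)."""
    return N * math.log(K) + (K * K + K) * math.log(N / K + 1) + math.log(1 / delta)

rng = random.Random(0)
N, K, p = 100, 5, 0.075                          # smaller than the study's N = 500
errors = [kl_error(N, K, p, rng) for _ in range(20)]
print(max(errors), theorem1_bound(N, K, 0.05))
```

Because Theorem 1 holds uniformly over assignments, the bound applies to the arbitrary assignment used here as well as to a fitted one, although a fitted assignment would typically produce larger errors.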

To test whether the assumptions of Theorem 2 are necessary as well as sufficient to obtain
convergence of $L(A; z)/M$ to $L_P(z)/M$, blockmodels were next fitted to Erdős-Rényi networks
of increasing size, for $N$ in the range 50-1050. The corresponding normalized log-likelihood
error $|L(A; z) - L_P(z)|/M$ for different rates of growth in the expected number of edges $M$
and the number of fitted classes $K$ is shown in Fig. 1. Observe from the leftmost panel that when
$M = N (\log N)^4$ and $K = N^{1/2}$, as prescribed by the theorem, this error decreases in $N$.
If the edge density is reduced to $M/N = (\log N)^2$, we observe in the center panel convergence
when $K = N^{1/2}$ and divergence when $K = N^{3/5}$. This suggests that the error as a function
of $K$ follows Theorem 2 closely, but that the network can be somewhat more sparse than it requires.

To test the conditions of Theorem 3, blockmodels with parameters $(\bar z, \bar\theta)$ and
increasing number of classes $K$ were used to generate data, and corresponding node
misclassification error rates $N_e(\hat z)/N$ were recorded as a function of correctly specified
K-class blockmodel fitting. Model parameter $\bar z$ was chosen to yield equally-sized blocks, so
as to meet identifiability condition (i) of Theorem 3. Parameter
$\bar\theta = \alpha I + \beta 1 1^T$ was chosen to yield within-class and between-class success
probabilities with the property that for any class pair $(a, b)$, the condition
$D(\bar\theta_{aa} \,\|\, (\bar\theta_{aa} + \bar\theta_{ab})/2) = M K^{\gamma} / (20 N^2)$ was
satisfied, with $\gamma \in \{4/5, 9/10, 1\}$; identifiability condition (ii) was thus met
only in the $\gamma = 1$ case. The rightmost panel of Fig. 1 shows the fraction $N_e(\hat z)/N$
of misclassified nodes when $M = N (\log N)^2$ and $K = N^{1/2}$, corresponding to the setting in
which convergence of $L(A; z)/M$ to $L_P(z)/M$ was observed above; this fraction is seen to decay
when $\gamma = 1$ or 9/10, but to increase when $\gamma = 4/5$. This behavior conforms with
Theorem 3 and suggests that its identifiability conditions are close to being necessary as well
as sufficient.

Fig. 1. Simulation study results illustrating Theorems 1-3. Left: Likelihood error
$|L(A; z) - L_P(z)|/M$ as a function of network size $N$, shown for $M = N (\log N)^4$ with
$K = N^{1/2}$. Center: Same quantity for $M = N (\log N)^2$ with $K = N^{3/5}$ (dotted) and
$K = N^{1/2}$ (solid). Right: Error rate $N_e(\hat z)/N$ for $M = N (\log N)^2$ with
$K = N^{1/2}$ and $\gamma = 4/5$ (dotted), $\gamma = 9/10$ (dashed), $\gamma = 1$ (solid).



4. Network data example 
4·1. Facebook social network dataset
To illustrate the use of our results in the fitting of K-class stochastic blockmodels
to network data, we employed a publicly available social network dataset containing
N = 553 undergraduate Facebook profiles from the California Institute of Technology
(people.maths.ox.ac.uk/~porterm/data/facebook5.zip). These profiles indicate whenever a pair
of students have identified one another as friends, yielding a network of 11 511 edges and
accompanying covariate information including gender, class year, and hall of residence.

Traud et al. (2011) applied community detection algorithms to this network, and compared
their output to partitions based on categorical covariates such as those identified above. They
concluded that a grouping of students by residence hall was most similar to the best algorithmic
grouping obtained, and thus that shared residence hall membership was the best predictor for
the formation of community structure. This structure is reflected in the leftmost panel of Fig. 2,
which shows the network adjacency structure under an ordering of students by residence hall.

4·2. Logit blockmodel parameterization and fitting procedure
Here we build on the results of Traud et al. (2011) by taking covariate information explicitly
into account when fitting the Facebook dataset described above. Specifically, by assuming only
that links are independent Bernoulli variates and then employing confidence bounds to assess
fitted blocks by way of parameter $\bar\theta^{(z)}$, we examine these data for residual
community structure beyond that well explained by the covariates themselves.

Fig. 2. Facebook social network dataset and its fitting statistics for varying number of
blockmodel classes K. Left: Adjacency data matrix of a network of Facebook undergraduate student
profiles. Center: Model order statistic for fitted logit blockmodels as a function of K. Right:
Out-of-sample prediction error as a function of K.

Since the results of Theorems 1 and 2 hold uniformly over all choices of blockmodel membership
vector $z$, we may select $z$ in any manner, including those that depend on covariates.
For this example, we determined an approximate maximum likelihood estimate $\hat z$ under a logit
blockmodel that allows the direct incorporation of covariates. The model is parameterized such
that the log-odds ratio of an edge occurrence between nodes $i$ and $j$ is given by

$$\log \frac{P_{ij}}{1 - P_{ij}} = \tilde\theta_{z_i z_j} + x(i, j)^T \beta \quad (i = 1, \ldots, N;\ j = i+1, \ldots, N), \tag{4}$$

where $x(i, j)$ is a vector of covariates indicating shared group membership, and model
parameters $(\tilde\theta, \beta, z)$ are estimated from the data. Four categorical covariates
were used: the three indicated above, plus an eight-category covariate indicating the range of the
observed degree of each node; see Karrer & Newman (2011) for related discussion on this point.
Matrix $\tilde\theta$ is analogous to blockmodel parameter $\theta$, vector $z$ specifies the
blockmodel class assignment, and vector $\beta$ was implemented here with sum-to-zero
identifiability constraints.
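
To make the parameterization in (4) concrete, the sketch below converts a block term and a covariate term into an edge probability. The function name, the zero-based class labels, the single made-up covariate, and all numerical values are assumptions of this example; in particular it does not impose the sum-to-zero constraints or perform any fitting.

```python
import math

def edge_probability(i, j, z, theta, beta, x):
    """P_ij from the logit model (4): log-odds = theta[z_i][z_j] + x(i, j) . beta."""
    log_odds = theta[z[i]][z[j]] + sum(xk * bk for xk, bk in zip(x(i, j), beta))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical toy inputs: one covariate indicating a shared (made-up) group label.
group = ["a", "a", "b", "b"]
x = lambda i, j: [1.0 if group[i] == group[j] else 0.0]
z = [0, 0, 0, 1]
theta = [[-1.0, -3.0],
         [-3.0, 0.5]]
beta = [1.0]
print(edge_probability(0, 1, z, theta, beta, x))
```

In this form the block term plays the role of a class-pair intercept, so any structure it retains is structure not already explained by the shared-membership covariates.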

Because exact maximization of the log-likelihood function $L(A; \tilde\theta, \beta, z)$
corresponding to (4) is computationally intractable, we instead employed an approach that
alternated between Markov chain Monte Carlo exploration of $z$ while holding
$(\tilde\theta, \beta)$ constant, and optimization of $\tilde\theta$ and $\beta$ while holding
$z$ constant. We tested different initialization methods and observed that highest likelihoods
were consistently produced by first fitting class assignment vector $z$. This fitting procedure
provides a means of estimating averages $\bar\theta^{(z)}$ over subsets of the set
$\{P_{ij}\}_{i<j}$, under the assumption that the network data comprise independent
Bernoulli($P_{ij}$) trials.
4·3. Data analysis
We fitted the logit blockmodel of (4) for values of K ranging from 1 to 50 using the stochastic
maximization procedure described in the preceding paragraph, and gauged model order by
the Bayesian information criterion and out-of-sample prediction using five-fold cross validation,
shown respectively in the center and rightmost panels of Fig. 2. These plots suggest a relatively
low model order, beginning around K = 4. The corresponding 95% confidence bounds on the
divergence of $\hat\theta^{(z)}$ from $\bar\theta^{(z)}$ provided by Theorem 1 also yield small
values for K in the range 4-7: for example, when K = 5, the normalized sum of Kullback-Leibler
divergences $\binom{N}{2}^{-1} \sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab})$
is bounded by 0·0067. Corresponding normalized root-mean-square error bounds over this range of K
are approximately one order of magnitude larger.

We then examined approximate maximum likelihood estimates of $z$ for K in the range 4-7,
as shown in the top two rows of Fig. 3; larger values of K also reveal block structure, but exhibit
correspondingly larger confidence bound evaluations. The permuted adjacency structures under
each estimated class assignment $\hat z$ are shown in the top row, along with the corresponding
values of $\hat\theta$ below in the second row. The structure of $\hat\theta$ over this range of
K suggests that after covariates are taken into account, it is possible to identify a subset of
students who divide naturally into two residual "meta-groups" that interact less frequently with
one another in comparison to the remaining subjects in the dataset; the precision of the
corresponding estimates $\hat\theta$ can be quantified by Theorem 1, as in the caption of Fig. 3.

Fig. 3. Results of logit blockmodel fitting to the data of Fig. 2 for each of
$K \in \{4, 5, 6, 7\}$ classes. Top row: Adjacency structure of the data, permuted to show block
assignments for $K \in \{4, 5, 6, 7\}$. Second row: Corresponding estimates $\hat\theta$, with
Kullback-Leibler divergence bounds 0·0057, 0·0067, 0·0077, and 0·0086. Bottom row: Residence
hall assignments of students whose grouping remained constant over these four values of K.

As K increases, these groups become more tightly concentrated, as extra blocks absorb stu- 
dents whose connections are more evenly distributed. While the exact membership of each group 
varied over K, in part due to stochasticity in the fitting algorithm employed, we observed 199 
students whose meta-group membership remained constant. The bottom row of Fig. [3] shows the 
8 residence halls identified for these sets of students, with the ninth category indicating unre- 
ported; observe that the effect of residence hall is still visible in that the left-hand grouping has 
more students in halls 4-7, while the right-hand grouping has more students in halls 1, 2, and 8. 
Appendix

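Proofs of Theorems 1 and 2
Proof of Theorem 1. To begin, observe that for any fixed class assignment $z$, every
$\hat\theta_{ab}$ is the average of $n_{ab}$ independent Bernoulli random variables, with
corresponding mean $\bar\theta_{ab}$. A Chernoff bound (Dubhashi & Panconesi, 2009) shows

$$\mathrm{pr}(\hat\theta_{ab} \ge \bar\theta_{ab} + t) \le e^{-n_{ab} D(\bar\theta_{ab} + t \,\|\, \bar\theta_{ab})}, \quad 0 < t \le 1 - \bar\theta_{ab},$$
$$\mathrm{pr}(\hat\theta_{ab} \le \bar\theta_{ab} - t) \le e^{-n_{ab} D(\bar\theta_{ab} - t \,\|\, \bar\theta_{ab})}, \quad 0 < t \le \bar\theta_{ab}.$$

Since these bounds also hold respectively for $\mathrm{pr}(\hat\theta_{ab} = \bar\theta_{ab} \pm t)$,
we may bound the probability of any given realization
$\vartheta \in \{0, 1/n_{ab}, \ldots, 1\}$ of $\hat\theta_{ab}$ in terms of the Kullback-Leibler
divergence of $\bar\theta_{ab}$ from $\vartheta$:

$$\mathrm{pr}(\hat\theta_{ab} = \vartheta) \le e^{-n_{ab} D(\vartheta \,\|\, \bar\theta_{ab})}.$$

By independence of the $\{A_{ij}\}_{i<j}$, this implies a corresponding bound on the probability
of any $\hat\theta$:

$$\mathrm{pr}(\hat\theta) \le \exp\Big\{ -\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \Big\}. \tag{A1}$$

Now, let $\hat\Theta$ denote the range of $\hat\theta$ for fixed $z$, and observe that since each
of the $\binom{K+1}{2}$ lower-diagonal entries $\{\hat\theta_{ab}\}_{a \le b}$ of $\hat\theta$
can independently take on $n_{ab} + 1$ distinct values, we have that
$|\hat\Theta| = \prod_{a \le b} (n_{ab} + 1)$. Subject to the constraint that
$\sum_{a \le b} n_{ab} = \binom{N}{2}$, we see that this quantity is maximized when
$n_{ab} = \binom{N}{2} / \binom{K+1}{2}$ for all $a \le b$, and hence

$$|\hat\Theta| \le \Big\{ \binom{N}{2} \Big/ \binom{K+1}{2} + 1 \Big\}^{\binom{K+1}{2}} < \Big( \frac{N^2}{K^2} + 1 \Big)^{(K^2 + K)/2} < \Big( \frac{N}{K} + 1 \Big)^{K^2 + K}. \tag{A2}$$

Now consider the event that $\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab})$ is
at least as large as some $\epsilon > 0$; the probability of this event is given by
$\mathrm{pr}(\hat\Theta_\epsilon)$ for

$$\hat\Theta_\epsilon = \Big\{ \hat\theta \in \hat\Theta : \sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \ge \epsilon \Big\}. \tag{A3}$$

Since $\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \ge \epsilon$ for all
$\hat\theta \in \hat\Theta_\epsilon$, we have from (A1) and (A3) that

$$\mathrm{pr}(\hat\Theta_\epsilon) = \sum_{\hat\theta \in \hat\Theta_\epsilon} \mathrm{pr}(\hat\theta) \le \sum_{\hat\theta \in \hat\Theta_\epsilon} e^{-\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab})} \le \sum_{\hat\theta \in \hat\Theta_\epsilon} e^{-\epsilon} = |\hat\Theta_\epsilon| e^{-\epsilon},$$

and since $|\hat\Theta_\epsilon| \le |\hat\Theta|$, we may use (A2) to obtain, for fixed class
assignment $z$,

$$\mathrm{pr}\Big\{ \sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \ge \epsilon \Big\} < \Big( \frac{N}{K} + 1 \Big)^{K^2 + K} e^{-\epsilon}. \tag{A4}$$

Appealing to a union bound over all $K^N$ possible class assignments and setting
$\epsilon = \log\{ K^N (N/K + 1)^{K^2 + K} / \delta \}$ then yields the claimed result. □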



Proof of Theorem 2. By Lemma 1, the difference $L(A; z) - L_P(z)$ can be expressed for any fixed
class assignment $z$ as $\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) + X - E(X)$,
where the first term satisfies the deviation bound of (A4), and
$X = \sum_{i<j} A_{ij} \log\{\bar\theta_{z_i z_j} / (1 - \bar\theta_{z_i z_j})\}$ comprises a
weighted sum of independent Bernoulli($P_{ij}$) random variables.

To bound the quantity $|X - E(X)|$, observe that since by assumption
$N^{-2} \le P_{ij} \le 1 - N^{-2}$, the same is true for each corresponding average
$\bar\theta_{z_i z_j}$. As a result, the random variables
$X_{ij} = A_{ij} \log\{\bar\theta_{z_i z_j} / (1 - \bar\theta_{z_i z_j})\}$ comprising $X$ are
each bounded in magnitude by $C = 2 \log N$. This allows us to apply a Bernstein inequality for
sums of bounded independent random variables due to Chung & Lu (2006, Theorems 2.8 and 2.9,
p. 27), which states that for any $\epsilon > 0$,

$$\mathrm{pr}\{ |X - E(X)| \ge \epsilon \} \le 2 \exp\Big\{ -\frac{\epsilon^2}{2 \sum_{i<j} E(X_{ij}^2) + (2/3) \epsilon C} \Big\}. \tag{A5}$$

Finally, observe that since the event $|L(A; z) - L_P(z)| \ge 2 \epsilon M$ implies either the
event $\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \ge \epsilon M$ or the
event $|X - E(X)| \ge \epsilon M$, we have for fixed assignment $z$ that

$$\mathrm{pr}\{ |L(A; z) - L_P(z)| \ge 2 \epsilon M \} \le \mathrm{pr}\Big[ \Big\{ \sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \ge \epsilon M \Big\} \cup \{ |X - E(X)| \ge \epsilon M \} \Big].$$

Summing the right-hand sides of (A4) and (A5), and then over all $K^N$ possible assignments, yields

$$\mathrm{pr}\Big\{ \max_z |L(A; z) - L_P(z)| \ge 2 \epsilon M \Big\} \le \exp\Big\{ N \log K + (K^2 + K) \log\Big(\frac{N}{K} + 1\Big) - \epsilon M \Big\} + 2 \exp\Big\{ N \log K - \frac{\epsilon^2 M}{8 \log^2 N + (4/3) \epsilon \log N} \Big\},$$

where we have used the fact that $\sum_{i<j} E(X_{ij}^2) \le 4 M \log^2 N$ in (A5). It follows
directly that if $K = O(N^{1/2})$ and $M = \omega(N (\log N)^{3+\delta})$, then
$\lim_{N \to \infty} \mathrm{pr}\{ \max_z |L(A; z) - L_P(z)| / M \ge \epsilon \} = 0$ for every
fixed $\epsilon > 0$ as claimed. □

Proof of Theorem 3

Proof of Theorem 3. To begin, note that Theorem 2 holds uniformly in $z$, and thus implies that

$$|L_P(\bar z) - L(A; \bar z)| + |L_P(\hat z) - L(A; \hat z)| = o_P(M).$$

Since $\hat z$ is the maximum-likelihood estimate of class assignment $\bar z$, we know that
$L(A; \hat z) \ge L(A; \bar z)$, implying that $L(A; \hat z) = L(A; \bar z) + \delta$ for some
$\delta \ge 0$. Thus, by the triangle inequality,

$$|L_P(\bar z) - L_P(\hat z) + \delta| \le |L_P(\bar z) - L(A; \bar z)| + |L_P(\hat z) - (L(A; \bar z) + \delta)| = o_P(M),$$

and since $L_P(\bar z) \ge L_P(\hat z)$ under any blockmodel with parameter $\bar z$, we have
$L_P(\bar z) - L_P(\hat z) = o_P(M)$. Under conditions (i) and (ii) of Theorem 3, we will now
show that also

$$L_P(\bar z) - L_P(\hat z) = \frac{N_e(\hat z)}{N}\, \Omega(M), \tag{A6}$$

holds for every realization of $\hat z$, thus implying that $N_e(\hat z) = o_P(N)$ and proving
the theorem.

To show (A6), first observe that any blockmodel class assignment vector $z$ induces a
corresponding partition of the set $\{P_{ij}\}_{i<j}$ according to $(i, j) \mapsto (z_i, z_j)$.
Formally, $z$ partitions $\{P_{ij}\}_{i<j}$ into $L$ subsets $(S_1, \ldots, S_L)$ via the mapping

$$\zeta_{ij} : (i = 1, \ldots, N;\ j = i+1, \ldots, N) \mapsto (l = 1, \ldots, L).$$

This partition is separable in the sense that there exists a bijection between $\{1, \ldots, L\}$
and the upper triangular portion of blockmodel parameter $\theta$, such that we write
$\theta_{\zeta_{ij}} = \theta_{z_i z_j}$ for membership vector $z$. More generally, for any
partition $\Pi$ of $\{P_{ij}\}_{i<j}$, we may define
$\bar\theta_l = |S_l|^{-1} \sum_{i<j} P_{ij}\, 1\{P_{ij} \in S_l\}$ as the arithmetic average
over all $P_{ij}$ in the subset $S_l$ indexed by $\zeta_{ij} = l$. Thus we may also define

$$L^*_P(\Pi) = \sum_{i<j} \{ P_{ij} \log \bar\theta_{\zeta_{ij}} + (1 - P_{ij}) \log(1 - \bar\theta_{\zeta_{ij}}) \},$$

so that $L^*_P$ and $L_P$ coincide on partitions corresponding to admissible blockmodel
assignments $z$.

The establishment of (A6) proceeds in three steps: first, we construct and analyze a refinement
of the partition $\Pi^z$ induced by any blockmodel assignment vector $z$ in terms of its error
$N_e(z)$; then, we show that refinements increase $L^*_P(\cdot)$; finally, we apply these results
to the maximum-likelihood estimate $\hat z$.

LEMMA 2. Consider a K-class stochastic blockmodel with membership vector $\bar z$, and let
$\Pi^z$ denote the partition of its associated $\{P_{ij}\}_{1 \le i < j \le N}$ induced by any
$z \in \{1, \ldots, K\}^N$. For every $\Pi^z$, there exists a partition $\Pi^*$ that refines
$\Pi^z$ and with the property that, if conditions (i) and (ii) of Theorem 3 hold,

$$L_P(\bar z) - L^*_P(\Pi^*) = \frac{N_e(z)}{N}\, \Omega(M), \tag{A7}$$

where $N_e(z)$ counts the number of nodes whose true class assignments under $\bar z$ are not in
the majority within their respective class assignments under $z$.

LEMMA 3. Let $\Pi'$ be a refinement of any partition $\Pi$ of the set $\{P_{ij}\}_{i<j}$; then
$L^*_P(\Pi') \ge L^*_P(\Pi)$.

Since Lemma 2 applies to any admissible blockmodel assignment vector $z$, it also applies to the
maximum-likelihood estimate $\hat z$ for any realization of the data; each $\hat z$ in turn
induces a partition $\Pi^{\hat z}$ of blockmodel edge probabilities $\{P_{ij}\}_{i<j}$, and (A7)
holds with respect to its refinement $\Pi^*$. By Lemma 3,
$L^*_P(\Pi^{\hat z}) \le L^*_P(\Pi^*)$. Finally, observe that
$L_P(\hat z) = L^*_P(\Pi^{\hat z})$ by the definition of $L^*_P$, and so
$L_P(\bar z) - L_P(\hat z) \ge L_P(\bar z) - L^*_P(\Pi^*)$, thereby establishing (A6). □

Proof of Lemma 2. The construction of $\Pi^*$ will take several steps. For a given membership
class under $z$, partition the corresponding set of nodes into subclasses according to the true
class assignment $\bar z$ of each node. Then remove one node from each of the two largest
subclasses so obtained, and group them together as a pair; continue this pairing process until no
more than one nonempty subclass remains, then terminate. Observe that if we denote pairs by their
node indices as $(i, j)$, then by construction $z_i = z_j$ but $\bar z_i \ne \bar z_j$.

Repeat the above procedure for each class under $z$, and let $C_1$ denote the total number of
pairs thus formed. For each of the $C_1$ pairs $(i, j)$, find all other distinct indices $k$ for
which the following holds:

$$D\Big( P_{ik} \,\Big\|\, \frac{P_{ik} + P_{jk}}{2} \Big) + D\Big( P_{jk} \,\Big\|\, \frac{P_{ik} + P_{jk}}{2} \Big) \ge C\, \frac{MK}{N^2}, \tag{A8}$$

where $C$ is the constant from condition (ii) of Theorem 3, and indices $ik$ and $jk$ in (A8) are
to be interpreted respectively as $ki$ whenever $k < i$, and $kj$ whenever $k < j$. Let $C_2$
denote the total number of distinct triples that can be formed in this manner.

We are now ready to construct the partition $\Pi^*$ of the probabilities
$\{P_{ij}\}_{1 \le i < j \le N}$ as follows: For each of the $C_2$ triples $(i, j, k)$, remove
$P_{ik}$ (or $P_{ki}$ if $k < i$) and $P_{jk}$ (or $P_{kj}$) from their previous subset
assignment under $\Pi^z$, and place them both in a new, distinct two-element subset. We observe
the following:

(i) The partition $\Pi^*$ is a refinement of the partition $\Pi^z$ induced by $z$: Since nodes
$i$ and $j$ have the same class label under $z$ in that $z_i = z_j$, it follows that for any $k$,
$P_{ik}$ and $P_{jk}$ are in the same subset under $\Pi^z$.

(ii) Since for each class at most one nonempty subclass remains after the pairing process, the
number of pairs is at least half the number of misclassifications in that class. Therefore we
conclude $C_1 \ge N_e(z)/2$.

(iii) Condition (ii) of Theorem 3 implies that for every pair of classes $(a, b)$, there exists
at least one class $c$ for which (A8) holds eventually. Thus eventually, for any of the $C_1$
pairs $(i, j)$, we obtain a number of triples at least as large as the cardinality of class $c$.
Condition (i) in turn implies that the cardinality of the smallest class grows as $\Omega(N/K)$,
and thus we may write $C_2 = C_1\, \Omega(N/K)$.

We can now express the difference $L_P(\bar z) - L^*_P(\Pi^*)$ as a sum of nonnegative
divergences $D(P_{ij} \,\|\, \bar\theta_{\zeta^*_{ij}})$, where $\zeta^*$ is the assignment
mapping associated to $\Pi^*$, and use (A8) to lower-bound this difference:

$$L_P(\bar z) - L^*_P(\Pi^*) = \sum_{i<j} D(P_{ij} \,\|\, \bar\theta_{\zeta^*_{ij}}) \ge C_2\, \Omega\Big( \frac{MK}{N^2} \Big) = \frac{N_e(z)}{N}\, \Omega(M).$$
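
The pairing step of this construction can be sketched as follows. This is an illustration under stated assumptions rather than part of the proof: the function names are hypothetical, class labels are taken to be integers, and the error count treats the largest true-class subclass within each fitted class as the majority, with ties broken arbitrarily.

```python
import heapq
from collections import defaultdict

def pair_misassigned(z_fit, z_true):
    """Greedy pairing within each fitted class: repeatedly take one node from each
    of the two largest true-class subclasses until at most one subclass is nonempty."""
    subclasses = defaultdict(lambda: defaultdict(list))
    for node, (a, t) in enumerate(zip(z_fit, z_true)):
        subclasses[a][t].append(node)
    pairs = []
    for groups in subclasses.values():
        heap = [(-len(nodes), t) for t, nodes in groups.items()]  # max-heap by size
        heapq.heapify(heap)
        while len(heap) >= 2:
            size1, t1 = heapq.heappop(heap)
            size2, t2 = heapq.heappop(heap)
            pairs.append((groups[t1].pop(), groups[t2].pop()))
            for size, t in ((size1 + 1, t1), (size2 + 1, t2)):
                if size < 0:                     # subclass still nonempty
                    heapq.heappush(heap, (size, t))
    return pairs

def n_errors(z_fit, z_true):
    """Nodes whose true class is not the most common one within their fitted class."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, t in zip(z_fit, z_true):
        counts[a][t] += 1
    return sum(sum(c.values()) - max(c.values()) for c in counts.values())

# Hypothetical toy assignment vectors.
z_true = [0, 0, 0, 1, 1, 2, 0, 1]
z_fit  = [0, 0, 0, 0, 0, 0, 1, 1]
pairs = pair_misassigned(z_fit, z_true)
```

Each returned pair shares a fitted class but differs in true class, and the number of pairs is at least half the error count, which is the inequality used in observation (ii).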

517 

518 Proof of Lemma\3\ Let II' be a refinement of any partition II of the set {P%j}i<j, and given a £ 

519 {1, ■ • ■ , L'} indexing S' a , let F(a) denote its index under II. We show that Zp(IT') > L* P (U) as follows: 
520 
521 
522 
523 
524 
525 



L' 

Zp(n') = K\{e' a iog^ + (1 - K) iog(i - K)} 



a=l 
L' 

> 



£ \s' a \{o' a io g e F(a) + (1 - e' a ) io g (i - e F[a) )) 

a=l 

526 l 

527 =^|5 b |{^log^ + (l-^)log(l-^)} = Z P (n), 

528 6=1 
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